Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
by baskerville.CS.Arizona.EDU (8.8.7/8.8.7) with SMTP id MAA14385
for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Fri, 13 Mar 1998 12:35:03 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
id AA15705; Fri, 13 Mar 1998 12:35:02 -0700
Date: Fri, 13 Mar 1998 08:38:12 -0700
From: swampler@noao.edu (Steve Wampler)
Subject: Re: Letter Probabilities
To: icon-group@optima.CS.Arizona.EDU
Message-Id: <swampler-9802131538.AA00545533@orpheus.gemini.edu>
In-Reply-To: <35089F74.6018@gte.net>
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Content-Length: 2820
Mark Evans wrote:
> Here is a small Icon problem related to letter probabilities. Each
> letter has a "probability of occurrence" in the information-theoretic
> setting. This probability can be estimated from sample texts. I want
> to generate random text based on these probabilities.
>
> I have a table that associates individual letters (one-char strings)
> with real numbers (probabilities). We can assume for the sake of
> argument that the sum of all probabilities in my table is unity.
>
> Given this table (which I already have Icon code to obtain) what is the
> most efficient method of generating random text? What I am thinking of
> at the moment is:
>
> (1) get a sorted list of [key,value] pairs,
>     sorted by value (probability),
>     highest probability first
>
> (2) generate a random number from 0.0 to 1.0
>
> (3) use a while-loop to find the slot in the
>     sorted list where the number falls;
>     I would subtract each passing probability
>     until my placeholder value had vanished; e.g.
>
>        i := 0    # running index
>        x := ?0   # random number 0.0 - 1.0
>        while x > 0 do
>           {
>           i +:= 1
>           x -:= prob_list[i][2]
>           }
>        letter := prob_list[i][1]
>
>
>
> This all seems rather awkward to me, especially step (3). Isn't there
> some construct in Icon that could do this more elegantly? Some way to
> search a list for a pair of elements that bracket a variable value?

Hmmm, one *very fast* way to produce random text is to build a string from
the characters in your table, where the probability of each character
controls the number of repetitions of that character. Then outputting
n characters of random text is:

    every writes(|?s \ n)

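One possible way to build such a string from an existing table of
probabilities is sketched below. The procedure name weighted_string and
the scale parameter (the total number of character slots, here 1000) are
assumptions made for illustration and are not part of the original
message. Letters whose probability rounds to zero slots are dropped, so
a larger scale gives a closer approximation at the cost of a longer string.

```icon
# Hypothetical helper: turn a table of letter probabilities into a
# string in which each letter is repeated in proportion to its probability.
procedure weighted_string(probs, scale)
   local s, c
   /scale := 1000                    # assumed default resolution
   s := ""
   every c := key(probs) do
      s ||:= repl(c, integer(probs[c] * scale + 0.5))
   return s
end

procedure main()
   local probs, s
   probs := table(0.0)
   probs["a"] := 0.5                 # made-up example probabilities
   probs["b"] := 0.3
   probs["c"] := 0.2
   s := weighted_string(probs)
   every writes(|?s \ 40)            # 40 random letters
   write()
end
```

Since ?s picks every position of s with equal likelihood, a letter that
fills half of the string comes out about half of the time.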
Also, you can even simplify the creation of the string by skipping the
building of the table of probabilities. (I realize that you already have
code to produce the table, but this is fun to think about anyway.)
Here is a complete program that reads in a sample text (including
newline characters) and outputs random text based on the probability
of character occurrence in the sample text:
The sample text is assumed to be 10,000,000 characters or fewer
(just for fun - you would probably be better off with a more general
approach to reading in the text to remove this arbitrary limit....)
I did it this way to keep the solution small...the setting of &random
could be improved, also.
====================
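procedure main(args)
   # Number of characters to write; defaults to 10000 if no argument is given.
   limit := integer(\args[1]) | 10000
   # Seed from the time of day: "hh:mm:ss" is mapped to the digits "hhmmss".
   &random := map("HhMmSs","Hh:Mm:Ss", &clock)
   # reads() takes a length and keeps newline characters; read() does not.
   s := reads(&input, 10000000)
   every writes(|?s \ limit)
end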
=====================
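
The arbitrary size limit can be avoided by reading standard input in
chunks until end of file. What follows is only a sketch of that idea:
the chunk size of 65536 is an arbitrary assumption, and the seed is the
same &clock mapping used in the program above.

```icon
procedure main(args)
   local limit, s, chunk
   limit := integer(\args[1]) | 10000
   &random := map("HhMmSs", "Hh:Mm:Ss", &clock)
   s := ""
   # reads() fails at end of file, which ends the loop.
   while chunk := reads(&input, 65536) do
      s ||:= chunk
   if *s = 0 then stop("no sample text on standard input")
   every writes(|?s \ limit)
end
```

Repeated concatenation copies more data than a single large reads(), so
for very big samples the one-call version may still be the faster of the two.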
--
Steve Wampler - swampler@gemini.edu [Gemini 8m Telescopes Project (under AURA)]
The gods that smiled at your birth are now laughing openly. (Fortune Cookie)